Word Order Does NOT Differ Significantly Between Chinese and Japanese
Abstract
We propose a pre-reordering approach for Japanese-to-Chinese statistical machine translation (SMT). The approach uses dependency structure and manually designed reordering rules to rearrange the morphemes of Japanese sentences into a Chinese-like word order before a baseline phrase-based (PB) SMT system is applied. Experimental results on the ASPEC-JC data show that, compared with the organizer's baseline PB SMT system, the proposed pre-reordering approach yields a slight improvement in BLEU and a moderate improvement in RIBES. The approach also shows improvement in human evaluation. We observe that word order does not differ much between the two languages, even though Japanese is a subject-object-verb (SOV) language and Chinese is a subject-verb-object (SVO) language.
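The abstract does not list the actual reordering rules, so the following is only a minimal sketch of the general idea of dependency-based pre-reordering. It assumes a morpheme-level dependency tree in which each head keeps its dependents split into those before it and those after it. The `Node` class, the tags `"V"`/`"N"`/`"P"`, the relation labels, and the single "move the object after the verb" rule are all hypothetical illustrations, not the authors' rules. The sketch only permutes morphemes; it does not translate or delete anything.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    pos: str                 # hypothetical coarse tag: "V" verb, "N" noun, "P" particle
    rel: str = ""            # hypothetical relation to the head: "subj", "obj", "case", ...
    left: list[Node] = field(default_factory=list)   # dependents before the head
    right: list[Node] = field(default_factory=list)  # dependents after the head


def linearize(node: Node) -> list[str]:
    """Read a subtree off in surface order: left dependents, head, right dependents."""
    out: list[str] = []
    for dep in node.left:
        out.extend(linearize(dep))
    out.append(node.word)
    for dep in node.right:
        out.extend(linearize(dep))
    return out


def reorder(node: Node) -> Node:
    """Hypothetical rule: move a verb's object dependents from before the verb
    to just after it (SOV -> SVO). Applied recursively; the input is not mutated."""
    left = [reorder(d) for d in node.left]
    right = [reorder(d) for d in node.right]
    if node.pos == "V":
        moved = [d for d in left if d.rel == "obj"]
        left = [d for d in left if d.rel != "obj"]
        right = moved + right
    return Node(node.word, node.pos, node.rel, left, right)


# "私 は 本 を 読む" (I read a book): both noun phrases precede the verb.
sentence = Node("読む", "V", left=[
    Node("私", "N", "subj", right=[Node("は", "P", "case")]),
    Node("本", "N", "obj", right=[Node("を", "P", "case")]),
])

print(" ".join(linearize(sentence)))           # 私 は 本 を 読む
print(" ".join(linearize(reorder(sentence))))  # 私 は 読む 本 を
```

The reordered sequence follows the Chinese verb-object order (cf. 我 读 书), and a PB SMT system would then be trained and decoded on such reordered source sentences. Keeping each particle attached to its noun means it moves together with its phrase.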
Similar resources
Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese
Word segmentation is usually considered an essential step for many Chinese and Japanese natural language processing tasks, such as name tagging. This paper presents several new observations and analyses of the impact of word segmentation on name tagging: (1) Due to the limitation of current state-of-the-art Chinese word segmentation performance, a character-based name tagger can outperform its...
Exploiting Shared Chinese Characters in Chinese Word Segmentation Optimization for Chinese-Japanese Machine Translation
Unknown words and word segmentation granularity are two main problems in Chinese word segmentation for Chinese-Japanese machine translation (MT). In this paper, we propose an approach that exploits common Chinese characters shared between Chinese and Japanese to optimize Chinese word segmentation for MT, aiming to solve these problems. We augment the system dictionary of a Chinese segmenter b...
Bilingualism, Biliteracy and Metalinguistic Awareness: Word Awareness in English and Japanese Users of Chinese as a Second Language
Cross-linguistic research shows that some aspects of metalinguistic awareness are affected by characteristics of different writing systems. Users of writing systems that mark word boundaries (such as English) develop word awareness, while users of unspaced writing systems (such as Chinese) do not. Previous research showed that English-speaking users of Chinese as a Second Language (CSL) have hi...
The effect of canonical word order on the production and comprehension of pseudoclefts in L2
This study investigated the effect of word order and age on the production and comprehension of pseudoclefts in L2 across two experiments. For each experiment, 16 female students aged between 179 and 210 months were recruited from a secondary school. These students were divided into two groups based on their age range; one group for investigating the effect of word order and age on the productio...
Chinese and Japanese Word Segmentation Using Word-Level and Character-Level Information
In this paper, we present a hybrid method for Chinese and Japanese word segmentation. Word-level information is useful for analyzing known words, while character-level information is useful for analyzing unknown words; the method uses both types of information to handle known and unknown words effectively. Experimental results show that this method achieves high o...
Japanese Kanji Word Processing for Chinese Learners of Japanese: A Study of Homophonic and Semantic Primed Lexical Decision Tasks
The current study investigates phonological involvement in Japanese word recognition by advanced and intermediate Chinese learners. A homophonic, semantic and unrelated (control) primed lexical decision task was used to test the participants' reaction times (RTs) and accuracy scores. Only the RTs of the participants' accurate YES responses in the lexical decision task (yes/no) were used as dep...